ABSTRACT

The yeast two-hybrid (Y2H) system is the most 
widely applied methodology for systematic 
protein-protein interaction (PPI) screening and the 
generation of comprehensive interaction networks. 
We developed a novel Y2H interaction screening 
procedure using DNA microarrays for high- 
throughput quantitative PPI detection. Applying a 
global pooling and selection scheme to a large col- 
lection of human open reading frames, proof-of- 
principle Y2H interaction screens were performed 
for the human neurodegenerative disease proteins 
huntingtin and ataxin-1. Using systematic controls 
for unspecific Y2H results and quantitative bench- 
marking, we identified and scored a large number of 
known and novel partner proteins for both 
huntingtin and ataxin-1. Moreover, we show that 
this parallelized screening procedure and the 
global inspection of Y2H interaction data are 
uniquely suited to define specific PPI patterns and 
their alteration by disease-causing mutations in 
huntingtin and ataxin-1. This approach takes advan- 
tage of the specificity and flexibility of DNA micro- 
arrays and of the existence of solid-related 
statistical methods for the analysis of DNA micro- 
array data, and allows a quantitative approach 
toward interaction screens in human and in model 
organisms. 



INTRODUCTION 

Networks of protein-protein interactions (PPIs) underlie 
all cellular processes and are highly predictive for func- 
tional relationships among gene products. Consequently, 
one of the principal goals in modern systems biology is the 
generation of comprehensive maps for PPIs in human and 
model organisms (1). The most important tool for system- 
atic mapping of binary PPIs is the well-established yeast 
two-hybrid (Y2H) methodology (2). In the classical imple- 
mentation of the Y2H system, a split transcription factor, 
consisting of activation and DNA-binding domains, is 
functionally reconstituted via the physical interaction of 
bait and prey proteins (3). The reconstituted hybrid tran- 
scription factor drives the expression of reporter genes 
that are scored by growth and color phenotypes (typically 
HIS3 and lacZ). Traditionally, a specific bait protein is 
combined with a cDNA library encoding prey fusion 
proteins, and interacting bait-prey combinations are 
identified from yeast colonies that are grown on selective 
agar plates. Crucial for the generation of entire protein 
interactome networks have been matrix-based Y2H 
screening procedures using libraries of annotated open 
reading frames (ORFs). These have been applied for the 
exploration of PPI networks in eukaryotic model organ- 
isms, such as Saccharomyces cerevisiae (4,5), Drosophila 
melanogaster (6), Caenorhabditis elegans (7), and also for 
a first overview of the human interactome (8,9). Moreover, 
a number of other screens focused on specific 
disease-causing proteins, and signaling pathways were per- 
formed to obtain increased depth and coverage of relevant 
PPI networks (10-13). 
So far, Y2H data have been reported as reproducible
outcomes from repeated interaction screens and are not 
based on quantitative measurements, which contrasts with 
gene expression and protein-DNA interaction data that 
have been extensively addressed with DNA microarrays 
(14). The DNA microarray technology has also been in- 
strumental in other applications, such as the high- 
throughput screening and quantitative measuring of 
drug sensitivity and resistance of yeast deletion strains 
(15,16). For these experiments, large populations of 
yeast strains comprising thousands of barcoded deletions 
are grown in the presence of diverse chemical compounds. 
The barcodes from compound-treated pools and un- 
treated control pools are amplified by polymerase chain 
reaction (PCR) and hybridized to DNA microarrays to 
score deletion strains that are under- or overrepresented 
after selection. The same strategy is followed with pools of 
yeast cells that overexpress large collections of ORFs (17). 
A large number of template ORFs of different sizes can be 
PCR amplified in one pooled reaction when using a primer 
set that anneals to adjacent vector sequences. 

Here, we apply a novel Y2H screening scheme that is 
based on pooling and competitive growth on selective 
plates. For proof-of-principle experiments, we explored 
PPI networks for the neurodegenerative disease proteins 
huntingtin (HTT) and ataxin-1 (ATXN1), which, as 
mutant variants, cause Huntington's disease (HD) and 
spinocerebellar ataxia type 1 (SCA1), respectively 
(18,19). Both proteins contain polyglutamine tracts that, 
on expansion to a pathological length, cause protein 
misfolding and aggregation in neuronal cells. PPI 
networks for HTT and ATXN1 have already been 
generated previously with high-throughput Y2H screens 
(11-13). It was suggested previously that the underlying 
function of polyQ tracts in proteins is to mediate PPIs and 
that alterations in PPI patterns due to polyQ expansions 
are important for disease pathogenesis (20,21). 

By screening the bait proteins HTT and ATXN1 against
a large preassembled library of ORFs, we achieved an
unprecedented throughput and parallelization of the
Y2H procedure. Quantitative benchmarking and receiver
operating characteristic (ROC) analysis via repeated sampling
revealed the distribution of known PPIs among the micro- 
array scores and determined the empirical cutoffs for high- 
confidence PPIs. For HTT, a larger number of PPIs 
identified by microarray Y2H screening were further con- 
firmed by LUminescence-based Mammalian intERactome 
mapping (LUMIER) co-immunoprecipitation assays. 
Importantly, the interpretation of Y2H interaction 
results as large sets of numerical scores not only allowed 
a systematic sampling for true positive results but also the 
exclusion of false positives. In addition, gene ontology 
(GO) term enrichment analysis predicted the functional 
involvement of HTT and ATXN1 in different cellular 
compartments and molecular functions, such as the in- 
volvement in cellular signaling pathways and protein 
binding. The screening approach presented here could be 
applied more broadly for the systematic mapping of 
human PPIs and to examine the effects of disease-specific 
mutations on PPI networks. 



MATERIALS AND METHODS 

Yeast strains and Y2H matrix 

Individual bait strains were constructed by cloning DNA
sequences encoding the huntingtin fragment HD506-Q23 and
the ataxin-1 variants ATXN1-Q32 and ATXN1-Q79 into the Y2H
plasmid pBTM116-D9, derived from pBTM116 (Clontech). The baits
selected for the Y2H screens were the N-terminal
fragment of the HD protein with a short polyglutamine
tract (HD506-Q23), wild-type ataxin-1 (ATXN1-Q32)
and mutant ataxin-1 with an elongated polyglutamine
tract (ATXN1-Q79). The bait constructs were transformed
into yeast strain L40ccua (MATa). The identity of the
individual bait clones was confirmed by PCR. The prey
ORFs are in vector pACT4-DM (derived from pACT2,
Clontech) and were grown in strain L40ccα (MATα). Plasmid
constructs and shuttling procedures are described elsewhere
(22).

Composition of the ActMat v3 matrix 

A large matrix of prey strains (ActMat collection, version 
3) containing full-length ORFs was constructed by 
recombinational cloning using the Gateway system 
(Invitrogen). ActMat v. 3 is an expanded version of a
prey matrix described earlier (8), containing a total of
14 119 full-length ORF clones from four different resources.
The full MGC3 collection (23) comprises ca.
80% of all clones, and additional clones are from the
Harvard, SMP and RZPD clone repositories, as well as a
collection of ORFs that were assembled in our lab (1-22
collection) (Supplementary Table S1). Of the assembled
clones, 13 405 ORFs are Entry clones that were transferred
into pACT4-DM via Gateway LR reactions. These Entry
clones in turn correspond to 11 083 unique Entrez GeneIDs
according to NCBI annotations, and 11 685 unique gene
annotations in the Ensembl v. 58 release. Comparing the
annotations in the ActMat v. 3 collection with the probesets
on the ST1.0 array that were mapped with the Ensembl v. 58
release, we identified 10 929 corresponding Ensembl gene
IDs (10 500 Entrez GeneIDs).

Pooling and selective growth 

The ActMat v. 3 strains were arrayed in 384-well microtiter
plates and stamped out onto seven minimal medium
Omnitrays. Arrayed strains were grown in 45 µl SD-Leu
(384-well plates) until saturation, stamped out on
Omnitrays containing SD-Leu agar medium using a KBio
Systems K4 robot and grown for 2 days at 30°C. Freshly
grown yeast strains were then washed off with SD-Trp
medium (containing 10% glycerol), pooled and
concentrated to 50 or 100 optical density units at 600 nm
(OD600) per ml to generate pool aliquots. For each
mating reaction, 20 µl of the concentrated stock, roughly
corresponding to one OD600 unit (1-2 × 10^7 cells), was used.
The amount of 1-2 × 10^7 cells per OD600 unit covers the
complexity of the library (ca. 14 000 ORFs in the pools) several
hundred-fold.
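The coverage estimate can be checked with a short calculation. The sketch below assumes the figures stated above (1-2 × 10^7 cells per mating reaction and ca. 14 000 ORFs in the pool); the function name is illustrative only.

```python
def fold_coverage(cells: float, library_size: int) -> float:
    """Average number of cells per ORF clone in a pooled aliquot."""
    return cells / library_size


LIBRARY_SIZE = 14_000  # approximate number of ORFs in the pool

low = fold_coverage(1e7, LIBRARY_SIZE)   # lower bound of cells per OD600 unit
high = fold_coverage(2e7, LIBRARY_SIZE)  # upper bound of cells per OD600 unit
print(f"{low:.0f}- to {high:.0f}-fold coverage")
```

Under these assumptions, each ORF is represented by roughly 700 to 1400 cells on average, provided that all strains contribute equally to the pool.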

For the bait screening procedure, freshly grown cultures
(5 ml) of the bait strain were concentrated in
a small volume to a maximum density of 50 ODs/ml.
The concentrated bait strain was then combined with a
prey pool aliquot in a 1:1 to 2:1 ratio and thoroughly
mixed. Then, 10-20 µl of the mix was spotted on yeast
extract peptone glucose (YPD) agar medium and
incubated for 24 h at 30°C to allow for sufficient mating.
To control mating efficiency, a small amount of cell
material was diluted 1:20 000 in liquid SDIV
(SD-His-Leu-Trp-Ura) and plated on SDII (SD-Leu-Trp),
SD-Leu and SD-Trp plates. Because
mating efficiency of the prey strains generally diminishes
with every freeze-thaw cycle, prey pools were
stored in 1 ml aliquots for single use. For the preselection
of diploids, the mated cells were transferred onto
SDII agar medium using an inoculation loop and
incubated for 24 h at 30°C. Pools enriched for diploid
cells were diluted to an OD600 of 0.05 in 5 ml SDIV
medium and grown at 30°C with shaking at 250 rpm. For four of
the screens (with HTT and pBTM), the inoculum was
increased 5-fold (OD600 = 0.25) and/or the culture
volume 10-fold (50 ml).
After reaching an OD600 of 2-3 (ca. 48 h), cells were
diluted back into a fresh culture to OD600 = 0.05 for a
second round of selection (ca. 24 h). Considering the
total incubation in SDIV medium and an average generation
time of ~3.5 h, the selection for HIS3 reporter
activation was expected to last for ca. 20-24 generations.
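The generation estimate follows from dividing the total incubation time by the generation time. A minimal sketch, assuming the approximate durations given above (ca. 48 h and ca. 24 h) and a constant generation time of 3.5 h; it ignores any slowdown as cultures approach saturation.

```python
def generations(hours: float, generation_time_h: float = 3.5) -> float:
    """Number of doublings in a given time at a constant generation time."""
    return hours / generation_time_h


first_round_h = 48   # approximate duration of the first selection round
second_round_h = 24  # approximate duration of the second selection round

total_generations = generations(first_round_h + second_round_h)
print(f"about {total_generations:.1f} generations")
```

With these inputs, the estimate falls at the lower end of the stated range of 20-24 generations.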

Preparation of DNA microarrays 

DNA was extracted from 1 ml of each final SDIV culture
using the Zymoprep II Yeast Plasmid Miniprep protocol
(Zymo Research). To measure the full representation of all
ORFs in ActMat v. 3, an equivalent amount (~2 × 10^7
cells) was extracted from the original pooled cells. To
increase the yield of plasmid DNA eluted from the
column, 20 µl bdH2O was incubated in the column for
2 min before spinning into a new Eppendorf tube. The
population of cloned ORFs in prey plasmid pACT4 was
selectively amplified with primers pACT4-5-P3 (5'-TGC
GGG GTT TTT CAG TAT CTA-3') and pACT4-3-P4
(5'-ATG ATG AAG ATA CCC CAC CAA A-3') using
the Expand High Fidelity PCR kit (Roche). The PCR
reaction (50 µl) contained one-tenth of the eluted DNA
(2 µl), 300 nM of each primer, 200 µM of dNTP mix and
2.6 U/reaction Expand High Fidelity enzyme. The routine
PCR program was 10 min initial denaturation, followed
by 35 cycles of amplification (30 s at 95°C, 45 s at 55°C and
5 min at 68°C) and 10 min final elongation at 68°C. The
amplified DNA was measured with a NanoDrop spectrophotometer
and checked on 1% agarose gels
(Supplementary Figure S1).
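For planning purposes, the cycling program can be written down as a small data structure and its total hold time computed. This is an illustrative sketch only: the step labels (denaturation, annealing, extension) are the conventional reading of the three temperatures, the temperature of the initial denaturation is not stated in the text and is left empty, and ramp times between steps are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    label: str
    seconds: int
    temp_c: Optional[int]  # None where the text gives no temperature


INITIAL = Step("initial denaturation", 10 * 60, None)
CYCLE = (
    Step("denaturation", 30, 95),
    Step("annealing", 45, 55),
    Step("extension", 5 * 60, 68),
)
N_CYCLES = 35
FINAL = Step("final elongation", 10 * 60, 68)


def hold_time_seconds() -> int:
    """Sum of all programmed hold times, excluding ramping."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


total_s = hold_time_seconds()
print(f"{total_s} s = {total_s / 3600:.2f} h")  # 14325 s = 3.98 h
```

The long extension step of 5 min per cycle dominates the run time, which is consistent with the need to amplify ORFs of very different sizes in one pooled reaction.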

PCR products were then purified with the PCR purification
kit (Invitrogen) according to the instructions of the
manufacturer. Elution was done in 50 µl water after
1 min incubation. Half of the amplified PCR product
(25 µl) was biotin labeled with the BioPrime DNA
labeling system. Labeling was done according to the specifications
of the manufacturer (Invitrogen). The total
volume of 50 µl from the labeling reaction was used for
microarray hybridization.



Affymetrix microarray processing 

Hybridization to Affymetrix Human Gene ST 1.0 arrays was
done according to the WT Sense Target Labeling Assay
Manual with minor modifications. Prehybridization was
done for 10 min at 45°C and 60 rpm in the hybridization
oven. The total hybridization mix (150 µl) contained ~50 µl
total labeled DNA, 3 nM B2 oligo, 1x control RNAs
(bioB, bioC, bioD, cre), 1x hybridization buffer and
7.5% dimethyl sulfoxide. The mix (150 µl) was denatured
for 2 min at 95°C, stored for 2 min on ice and then used to
fill the array chamber completely. Washing and staining were
done according to the Affymetrix Eukaryotic Antibody Staining
protocol (protocol FS450-0007), but without the signal
amplification using the biotinylated antibody. In the
modified protocol, stain 1 contained streptavidin
phycoerythrin (SAPE) solution [1x
2-(N-morpholino)ethanesulfonic acid (MES) stain buffer,
2 mg/ml acetylated bovine serum albumin and 10 µg/ml
streptavidin-phycoerythrin], stain 2 consisted only of 1x
MES stain and 2 mg/ml bovine serum albumin and stain
3 was 800 µl array holding buffer. See Supplementary
information for microarray analysis.

LUMIER assay 

LUMIER was developed as a comprehensive mammalian 
interactome screening strategy (24). Here we apply a 
modified version as a validation assay for PPI results. 
Protein A (PA)-Renilla luciferase (RL)-tagged fusion 
proteins were co-expressed with firefly luciferase (FL)- 
V5-tagged interactor proteins in HEK293 cells. After 
48 h, protein complexes were co-immunoprecipitated
from 70 µl cell extracts on IgG-coated beads and subsequently
washed with 100 µl phosphate-buffered saline;
interactions between bait (PA-RL) and prey proteins 
(FL fusions) were monitored by quantification of FL 
activities. Quantification of RL activity was used to 
confirm that PA-RL-tagged bait protein is successfully 
immunoprecipitated from cell extracts. To detect RL-
and FL-based luminescence, the Dual-Glo Luciferase Kit
(Promega) was used. Bioluminescence was quantified in
a luminescence plate reader (TECAN Infinite M1000).

For each protein pair, three co-immunoprecipitation experiments
(Co-IPs) were performed in parallel (see
Supplementary Figure S4). PA-RL and FL without a
fusion protein were used as controls to examine background
protein binding. After 48 h, protein complexes
were co-immunoprecipitated on IgG-coated beads. By
comparing the firefly luminescence activity measured in
the Co-IP of the two fusion proteins with that of the controls,
the R-op and R-ob binding ratios were obtained, which
are a measure of the protein interaction specificity. Based
on well-characterized interaction test pairs, an interaction
was defined as positive when the calculated R-op and
R-ob ratios were >1.25 and >2, respectively.
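The decision rule can be expressed compactly. The sketch below takes the R-op and R-ob ratios as already computed inputs, because their exact derivation from the luminescence readings is not spelled out here; the class and function names are hypothetical.

```python
from dataclasses import dataclass

R_OP_MIN = 1.25  # threshold for the R-op ratio, as stated above
R_OB_MIN = 2.0   # threshold for the R-ob ratio, as stated above


@dataclass(frozen=True)
class CoIPResult:
    r_op: float
    r_ob: float


def is_positive(result: CoIPResult) -> bool:
    """An interaction counts as positive only if both ratios exceed their thresholds."""
    return result.r_op > R_OP_MIN and result.r_ob > R_OB_MIN


# Hypothetical example values, not measured data.
print(is_positive(CoIPResult(r_op=1.6, r_ob=2.8)))  # both thresholds passed
print(is_positive(CoIPResult(r_op=1.6, r_ob=1.7)))  # R-ob too low
```

Requiring both ratios to pass guards against pairs that score well against only one of the two background controls.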

Bioinformatics 

Quantitative scoring of the microarray data for known 
positive sets (literature) was performed using the 
QiSampler application (25). Literature interactions for 
binary classification in the QiSampler procedure were
derived from the v1.2 release of the Human Integrated
Protein-Protein Interaction rEference (HIPPIE)
database (26), available at http://cbdm.mdc-berlin.de/tools/hippie.
Sets of GeneIDs associated with specific
GO terms were downloaded from the Gene Ontology
browser AmiGO (http://amigo.geneontology.org).
Network graphs were drawn with Cytoscape 2.7.0. Venn
diagrams were constructed with the BioVenn online tool
(http://www.cmbi.ru.nl/cdd/biovenn). GO-term enrichment
was determined using the human Consensus Path
Database (CPDB; http://cpdb.molgen.mpg.de) (27,28).
For CPDB analysis, P-values <0.01 were considered significant
for the enrichment of a pathway or a functional group
defined by GO. The analysis of the microarray data is
described in the Supplementary information.

RESULTS 

To apply DNA microarrays as a quantitative readout for 
PPI detection, we implemented a global mating and selec- 
tion scheme for Y2H interaction screens (Figure 1). An 
arrayed collection of ~14000 ORFs in Y2H prey vectors 
(ActMat v.3; Supplementary Table S1) was pooled and
small aliquots were used for mating reactions with bait 
constructs or the empty vector control (pBTM). As baits 
for interaction screening, a wild-type N-terminal HTT 
fragment (HD506-Q23) as well as both wild-type and 
mutant elongated ATXN1 (ATXN1-Q32 and ATXN1- 
Q79) were used. Mated diploid yeast cells were grown 
under selective conditions for HIS3 reporter gene activa- 
tion, and plasmid DNA was isolated from selected and 
non-selected samples. Prey ORFs were amplified with a 
primer pair in the prey vector that flanks the recombination 
sites, yielding PCR products over the full range of expected 
sizes (Supplementary Figure S1). PCR products were then
biotin labeled and hybridized to Affymetrix ST1.0 DNA 
microarrays. The hybridization signals were characterized 
with a number of parameters that measure the enrichment 
of bait-prey combinations (ratios) and different statistical 
tests (e.g. P-value and q-value Wilcoxon) to determine the 
significance of results and the screen-to-screen variations 
(Supplementary Tables S2 and S3). For any represented 
GeneID, the hybridization signals in the 'bait' screens, representing
the preys selected in combination with a defined
bait, were compared with two sets of 'control' samples: the 
original unselected pooled prey collection (Pool), which 
represents the background signal for a given library, and 
the empty bait plasmid (pBTM) control, which reports 
reporter activation in the absence of a functional bait. 
Hence, whereas the ratio pool quantifies the (initial) Y2H
interaction selection, the ratio pBTM displays the difference
between the specific 'bait' selection and the unspecific self-activation
of prey ORFs that interact with the DNA binding domain of
the vector in the absence of a bait protein.
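The two comparisons can be illustrated with a toy calculation. This is a simplified sketch, not the MAT-based analysis described in the Supplementary information: it assumes one summarized linear-scale signal per GeneID and sample type, and all gene names and values are hypothetical.

```python
import math


def log2_ratio(bait_signal: float, control_signal: float) -> float:
    """Log2 enrichment of a prey in the bait screen relative to a control."""
    return math.log2(bait_signal / control_signal)


# Hypothetical summarized signals per prey (linear scale).
signals = {
    "PREY_A": {"bait": 1200.0, "pool": 150.0, "pbtm": 160.0},  # bait-dependent
    "PREY_B": {"bait": 900.0, "pool": 140.0, "pbtm": 850.0},   # self-activating
}

scores = {
    gene: {
        "log2_ratio_pool": log2_ratio(s["bait"], s["pool"]),
        "log2_ratio_pbtm": log2_ratio(s["bait"], s["pbtm"]),
    }
    for gene, s in signals.items()
}

for gene, s in scores.items():
    print(gene, round(s["log2_ratio_pool"], 2), round(s["log2_ratio_pbtm"], 2))
```

In this toy example, PREY_B is strongly enriched relative to the pool but barely enriched relative to the empty vector control, which is the signature of bait-independent reporter activation.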

Quantitative benchmarking of Y2H screening results 

Screening of the ActMat v.3 library with HD506-Q23 in 
nine replicates revealed 9888 ORFs as 'present' on the 
microarray via detection calls or background tags
(Supplementary Table S4). Moreover, the application of
background tags and median probeset signals displayed a 
rather efficient amplification of ca. 90% of all ORFs and 
the absence of major biases and PCR artefacts 
(Supplementary Figure S2). In a second step, we found 
that differential enrichment of 2638 ORFs in the pool 
and pBTM comparisons was significant after multiple 
testing (q-value Wilcoxon <0.05). Through the applica- 
tion of this threshold, screen-to-screen variability is 
taken into account to determine bait-specific enrichment 
of ORFs compared with pool and pBTM controls 
(Supplementary Figure S3). Third, for a primary 
network analysis, low arbitrary cutoffs for bait-specific 
activation were set at log2-ratio >0.6 (ratio >1.516), 
which identified in total 224 preys in the pool and 111 
preys in the pBTM comparison (Figure 2A). The restriction
to 88 ORFs in the overlap between ratio pool and
ratio pBTM scores excludes potential false positives, especially
those from bait-independent reporter gene activation
(see Materials and Methods section and
Supplementary information for all technical descriptions). 
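The cutoff and overlap logic can be sketched as a set operation. The scores below are hypothetical; only the cutoff of log2-ratio >0.6 is taken from the text.

```python
LOG2_CUTOFF = 0.6
linear_cutoff = 2 ** LOG2_CUTOFF  # corresponds to a ratio of about 1.516


def above_cutoff(scores: dict, cutoff: float) -> set:
    """GeneIDs whose log2-ratio exceeds the cutoff."""
    return {gene for gene, score in scores.items() if score > cutoff}


# Hypothetical log2-ratios for four preys in the two comparisons.
log2_ratio_pool = {"A": 2.1, "B": 0.9, "C": 0.3, "D": 1.4}
log2_ratio_pbtm = {"A": 1.8, "B": 0.1, "C": 0.7, "D": 0.8}

pool_hits = above_cutoff(log2_ratio_pool, LOG2_CUTOFF)
pbtm_hits = above_cutoff(log2_ratio_pbtm, LOG2_CUTOFF)
overlap = pool_hits & pbtm_hits  # enriched in both comparisons

print(f"linear cutoff: {linear_cutoff:.3f}")
print(sorted(overlap))
```

Prey B passes only the pool comparison and is therefore excluded from the overlap, in line with the reasoning that such preys are candidates for bait-independent activation.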

With the presumption that the occurrence of known 
positives increases the confidence in the overall screening 
results, Y2H interaction results are commonly bench- 
marked against a dataset of known literature interactions. 
We recently developed QiSampler, a statistical tool that 
allows the comparison of numerical scores (such as the 
ratios from Y2H microarrays obtained here) with binary 
classifiers using a repetitive random and balanced 
sampling strategy (25). The primary source of binary clas- 
sifiers for the sampling analysis of Y2H interaction micro- 
array data were known literature interactions in the 
HIPPIE database, a comprehensive collection of human 
PPIs with experiment-based quality scores (26). Control 
samplings were done with random sets from all other 
preys that are not contained in the HIPPIE dataset. 
Because PPIs for HTT were previously explored, notably 
also with Y2H and co-precipitation assays in 
high-throughput experiments (11,12), a rather large collec- 
tion of 289 HTT interactions is available in the HIPPIE 
database, with 79 PPIs among the subset of filtered ORFs 
(2638) that were significant after multiple testing. Using 
the known PPIs as binary classifiers for the filtered HTT 
dataset in the QiSampler procedure, a high precision and a 
relatively low recall were found at increasing ratio cutoffs 
(Figure 2C). Moreover, the ROC curves displayed a clear 
discrimination for both ratio pool and ratio pBTM with 
respect to the diagonal representing randomness (area
under the ROC curve equal to 0.611 for ratio pool and 0.612
for ratio pBTM). The distinct quantitative effect with both
ratios reflected, therefore, a specific enrichment of known 
HTT interactors with the HD506-Q23 bait protein. 
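The idea of benchmarking numerical scores against binary classifiers by repeated balanced sampling can be sketched as follows. This is not the QiSampler implementation; it is a minimal illustration that draws equal numbers of positives and negatives in each repetition and averages a rank-based ROC AUC. The default of 1000 repetitions and a sampling rate of 0.5 follows the settings given in the legend of Figure 2; all other names and values are assumptions.

```python
import random


def auc(pos: list, neg: list) -> float:
    """Rank-based ROC AUC: fraction of (positive, negative) pairs ranked correctly."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sampled_auc(scores: dict, positives: set,
                repetitions: int = 1000, rate: float = 0.5, seed: int = 0) -> float:
    """Mean AUC over repeated balanced samples of positives and negatives."""
    rng = random.Random(seed)
    pos = [s for gene, s in scores.items() if gene in positives]
    neg = [s for gene, s in scores.items() if gene not in positives]
    k = min(max(1, int(rate * len(pos))), len(neg))  # same sample size for both classes
    total = 0.0
    for _ in range(repetitions):
        total += auc(rng.sample(pos, k), rng.sample(neg, k))
    return total / repetitions


# Hypothetical scores: known positives tend to score higher but overlap with negatives.
rng = random.Random(1)
toy_scores = {f"neg{i}": rng.gauss(0.0, 1.0) for i in range(200)}
toy_scores.update({f"pos{i}": rng.gauss(0.5, 1.0) for i in range(30)})
toy_positives = {f"pos{i}" for i in range(30)}

print(round(sampled_auc(toy_scores, toy_positives), 3))
```

An AUC of 0.5 corresponds to the diagonal (no discrimination), so values moderately above 0.5, as reported here, indicate an enrichment of known positives among high scores rather than a clean separation.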

Besides an overall characterization of the screening 
outcome, we sought to use the quantitative benchmarking 
procedures to estimate the ideal cutoff for high-confidence 
PPIs, based on the distribution of known positives across 
the entire range of ratio pool and ratio pBTM scores.
Estimations for an ideal cutoff aim at the inclusion of 
the most possible true positives, while avoiding the inclu- 
sion of false positives (see Supplementary information for 
cutoff determination).

Figure 1. Pooling and selective growth strategy for microarray Y2H screening. All strains of the Y2H prey collection (ActMat v. 3 in MATα) with
~14 000 full-length ORF clones are combined into one large pool, and aliquots are mated with MATa Y2H strains containing selected bait constructs
(HD506-Q23, ATXN1-Q32/-Q79) or the empty vector control (pBTM116-D9) in a 1:1 ratio. Because only a few prey ORFs are enriched after the Y2H
procedure by HIS3 activation, the resulting pools differ in their composition from the original pool. Selection of preys is either bait-dependent (blue)
or unspecific (red). After plasmid extraction and PCR amplification of the prey inserts, biotin-labeled PCR products are hybridized to Affymetrix
ST1.0 microarrays. Using model-based analysis of tiling arrays (MAT), the hybridization results from the bait screens are compared with the pool
control (pooled preys without selection or bait; Pool) and with the screening control (a selection screen with empty vector; pBTM). The first
comparison shows the enrichment of interacting preys. The second comparison shows enrichment that is specific to the use of the bait.

Confronted with high precision
and low recall in the sampling of Y2H interaction screening
data, we emphasized precision as the major determinant
for the cutoff selection. A modified version of
QiSampler allowed automated cutoff computation based 
on F-measurement (harmonic mean of the precision and
the recall), with adjustment through the alpha (α) coefficient.
After testing various α coefficients, we settled for a
4-fold emphasis of precision over recall, corresponding to
α = 0.94114. This resulted in cutoffs at log2-ratio
pool = 1.68 and log2-ratio pBTM = 1.578, roughly corresponding
to the log2-ratios at which 90% precision is reached 
(Figure 2B). Using this approach, 44 prey ORFs were 
found as interaction partners for HD506-Q23 in the 
overlap between pool and pBTM comparisons, which is 
a significant result (P = 1.1 × 10^-42 for one-sided Fisher's
exact test). Compared with the low arbitrary cutoff (see
Figure 2A), 14 out of the 15 known positive PPIs were
retained, while the total overlap was narrowed by 50%. 
Sampling also further eliminated 90% of potential false- 
positive interactions (ratio pool only). Hence, repeat 
sampling defined a set of 44 high-confidence PPIs for 
HD506-Q23 that result from bait-specific Y2H interaction 
selection. 
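The relation between the 4-fold emphasis on precision and the α coefficient can be made explicit. The sketch below assumes the standard weighted F-measure, F = 1 / (α/P + (1 - α)/R) with α = 1 / (1 + β^2), where β = 0.25 expresses a 4-fold emphasis of precision over recall; whether the modified QiSampler uses exactly this parameterization is an assumption.

```python
def alpha_from_beta(beta: float) -> float:
    """Convert the beta weight of the F-measure into the alpha coefficient."""
    return 1.0 / (1.0 + beta ** 2)


def f_measure(precision: float, recall: float, alpha: float) -> float:
    """Weighted harmonic mean of precision and recall."""
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


alpha = alpha_from_beta(0.25)
print(f"alpha = {alpha:.5f}")

# Hypothetical operating point with high precision and low recall.
balanced = f_measure(0.9, 0.1, alpha=0.5)    # ordinary harmonic mean
weighted = f_measure(0.9, 0.1, alpha=alpha)  # precision-weighted
print(round(balanced, 3), round(weighted, 3))
```

Under this definition, β = 0.25 gives α ≈ 0.9412, close to the value reported above. With precision weighted this heavily, a high-precision, low-recall cutoff scores far better than it would under the balanced harmonic mean, which is why the automated cutoffs settle near the point at which 90% precision is reached.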

A high-confidence microarray-Y2H interaction network 
for HD506-Q23 

For network representation, the PPIs with HTT were displayed
according to ratio and q-value Wilcoxon parameters
(Figure 3A and B). High ratios and low q-values in
the pool comparison reflected strong and reproducible
reporter gene activation, whereas the pBTM comparison
measured the specificity of the reporter activation with
respect to the empty vector control.

Figure 2. Quantitative benchmarking and cutoffs for HD506-Q23 screens. (A) Venn diagram for overlap of log2-ratio scores at arbitrary cutoff
(log2-ratio >0.6), ratio pool (blue), ratio pBTM (red) and literature data (HIPPIE, green). (B) Venn diagram for overlap of log2-ratio scores at
automated cutoffs based on repeat sampling with HIPPIE classifiers, log2-ratio pool (1.68) and log2-ratio pBTM (1.578). Evaluations are based on 2638
high-confidence results (GeneIDs) after multiple testing (q-value Wilcoxon <0.05), including 79 known positives from the HIPPIE database.
(C) Quantitative benchmarking and determination of automated cutoffs using QiSampler. Log2-ratios are the numerical scores and HIPPIE positives
are the binary classifiers. Precision, recall, F-measurements and ROC are displayed as repeat sampling curves. F-measurement is adjusted (β = 0.25,
α = 0.94114). For ROC, area under the ROC curve (ROC AUC) is calculated. Sampling with HIPPIE positive set: log2-ratio pool (solid blue line),
log2-ratio pBTM (solid red line). Control sampling with random set (no discrimination): log2-ratio pool (dashed blue line), log2-ratio pBTM (dashed red
line). Positives correspond to HIPPIE and negatives to non-HIPPIE PPIs. Sampling is done in 1000 repetitions, with a rate of 0.5 (50% of classifiers
sampled per run).

Importantly, about
one-third of the high-confidence HTT interactions were 
known HIPPIE positives (13 and HTT self-interaction), 
which was also reflected in the high precision for the 
QiSampler (see Figure 2C). Known positives among the 
highest-scoring HD506-Q23-interacting proteins included 
optineurin (OPTN), palmitoyltransferase ZDHHC17
(HIP14) and, importantly, also enzymes with roles in the
ubiquitin cycle, with activation (UBAC1), conjugation 
(UBE2K) and ligation (RNF20) (11,12,29). 



We applied a modified version of the LUMIER method 
as an orthogonal PPI confirmation assay (Supplementary 
Figure S4). Baits tagged with protein-A and RL were co- 
expressed with FL-tagged prey proteins (30). To test for 
interactions, 73 candidate proteins were chosen based on 
the low arbitrary cutoff (not including 15 known positives)
and co-expressed as bait and/or preys in HEK293 cells. 
Interactions were detected by quantification of FL lumi- 
nescence from co-immunoprecipitated protein complexes. 
In total, with HD506-Q23 either in bait or prey orienta- 
tion, 31 of the tested Y2H interactions (42%) were 
confirmed by LUMIER assays (Supplementary Table S5,
Figure 3C). Among the high-confidence interactions, 27 
out of the 44 PPIs (61%) were either confirmed with 
LUMIER or were HIPPIE positives (Figure 3A and B). 
If the cutoffs are raised further (log2-ratios >3), the
overall precision is even higher, with 14 out of 18 Y2H
PPIs (78%) confirmed by LUMIER or HIPPIE. Hence, 
in general, the significance and enrichment of the micro- 
array signals correlate well with confidence for genuine 
PPIs. 
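The confirmation rates quoted above follow directly from the counts. A short check, using only the numbers stated in the text:

```python
def percent(confirmed: int, tested: int) -> int:
    """Confirmed fraction as a percentage, rounded to the nearest integer."""
    return round(100 * confirmed / tested)


rates = {
    "LUMIER, candidates at low cutoff": percent(31, 73),
    "LUMIER or HIPPIE, automated cutoffs": percent(27, 44),
    "LUMIER or HIPPIE, log2-ratios > 3": percent(14, 18),
}

for label, value in rates.items():
    print(f"{label}: {value}%")
```

The rates increase with the stringency of the cutoff (42%, 61% and 78%), which is the basis for the statement that microarray signal enrichment correlates with confidence in the PPIs.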

We further inspected five intriguing novel HTT inter- 
actions among the highest ratio scores (all ratios >15) in 
more detail: ERCC6L, EVL, HMG20A, PIAS1 and 
ZNF451. HMG20A is part of the high-mobility group 
proteins, EVL is an Ena/VASP family protein that 
links cell signaling to remodeling of the actin cytoskel- 
eton (31) and ERCC6L is a member of the SNF2/ 
RAD54 helicase family with a role in DNA repair (32). 
The other two proteins have roles in protein sumoylation; 
PIAS1 (Protein inhibitor of STAT) is a Sumo ligase and 
ZNF451 a transcriptional co-regulator associated with 
PML bodies and Sumo (33). The associations of 
ERCC6L, EVL and HMG20A with HTT were confirmed 
in LUMIER assays, whereas those with PIAS1 and 
ZNF451 were not. We used the HIPPIE database for 
further evaluation of these five proteins, looking for 
co-complex formation with previously identified HTT 
interactors as an indirect evidence for association 
(Figure 3D). We found that EVL shares four partners 
out of 22 with HTT, including the actin monomer- 
binding protein profilin-2 (PFN2), and a spectrin 
protein involved in actin crosslinking (SPTAN1), which 
is consistent with a functional involvement of HTT in 
actin remodeling (34). ZNF451 shares 3 out of its 13 
known partners with HTT. These include also a 
shared interactor with EVL, the pre-mRNA processing 
factor 40 (PRPF40A), which was originally discovered as 
an HTT interacting protein (HIP10) (35). In addition, 
ZNF451 is linked to PIAS1 (33), which also shares 
20% of its known partners with HTT, further 
corroborating the association of HTT with the 
sumoylation machinery. This is consistent with the 
observed regulation of HTT stability by sumoylation 
and ubiquitination (36). 

For a functional analysis of PPI networks, we relied 
on a dual strategy: hypergeometric testing for 
overrepresentation of pathways and GO above 
determined cutoffs, and deep sampling of selected 
gene associations for true enrichment over the entire 
range of scores. For a global overview and functional 
enrichment analysis, it is preferable to increase the sen- 
sitivity using less stringent cutoffs, expanding also to 
less significant results (q-value Wilcoxon >0.05). 
When doing cutoff sampling and gene overrepre- 
sentation analysis for the total HD506-Q23 screening 
data (Supplementary Figure S5 and Table S6), the 
results are consistent with the multiple roles for HTT 
as a hub for PPIs and diverse functions such as DNA
binding, signaling and binding to ubiquitin-proteasome 
components (11,12,21). 



Functional analysis of wild-type and mutant ATXN1 PPI 
networks 

A major strength of the microarray-based Y2H method is 
the comprehensive readout of the total screening results, 
which allows the side-by-side comparison of PPI profiles 
for mutant and wild-type bait proteins. We compared the 
PPI patterns for the bait proteins ATXN1-Q32 and 
ATXN1-Q79 containing a non-pathogenic and a patho- 
genic polyQ tract, respectively (Figure 4). Inspecting the 
PPI data obtained for the two bait proteins revealed that 
ATXN1-Q79 interacts with two to three times more prey 
proteins than ATXN1-Q32 (Supplementary Table S7). 
Applying quantitative benchmarking for all 9941 
ATXN1-Q79 scores with 109 known HIPPIE positives, 
we found a relatively stronger performance for the 
expanded ATXN1-Q79 protein (ROC-values: 0.596 and 
0.585), whereas the performance for the short 
ATXN1-Q32 form was closer to random (ROC-values: 
0.554 and 0.520) (Supplementary Figure S6). 
Considering the bias of known positives, automated 
cutoffs were generated only from the ATXN1-Q79 PPI 
data (log2-ratio pool = 1.728, log2-ratio pBTM = 1.329;
α = 0.99) (Figure 4A), but then were also applied to the
ATXN1-Q32 screen set (Figure 4B). We found that eight 
known positive interactions were among the ATXN1-Q79 
PPIs, whereas for ATXN1-Q32, only one known prey 
protein (ARID5A) was selected. In total, 64 PPIs were 
found for the expanded ATXN1-Q79 and 24 for the 
short -Q32 form, with a significant overlap of 7 interactions
(P-value, Fisher exact test: 1.5 × 10⁻⁹)
(Figures 4B and C). Hence, the results for ATXN1-Q32
and -Q79 differ with respect to the overall yield of scores
and the enrichment of known literature positives. A
possible explanation for the increased number of inter- 
action partners observed with mutant ATXN1 is 
provided by the notion that the expanded glutamine 
tract alters the conformation of ATXN1 and may 
promote the formation of abnormal PPIs with multiple 
cellular proteins (20,37). However, it might also enhance the
strength of interaction with partners of the wild-type 
form, leading to an increased detection of true biological 
positives. 
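
The significance of an overlap between two hit lists can be checked with a one-sided hypergeometric (Fisher exact) test. The following Python sketch is illustrative only: the function name `overlap_p_value` is hypothetical, and taking all 9941 scored preys as the universe is an assumption, so the result need not reproduce the reported P-value exactly.

```python
from math import comb


def overlap_p_value(universe: int, n_a: int, n_b: int, overlap: int) -> float:
    """One-sided P(X >= overlap) under the hypergeometric null.

    universe: number of preys that could have been selected (assumed here)
    n_a, n_b: sizes of the two hit lists
    overlap:  number of preys found in both lists
    """
    total = comb(universe, n_b)
    upper = min(n_a, n_b)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_a, k) * comb(universe - n_a, n_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Counts taken from the text: 64 ATXN1-Q79 PPIs, 24 ATXN1-Q32 PPIs, 7 shared.
# The universe of 9941 preys is an assumption made for this illustration.
p = overlap_p_value(universe=9941, n_a=64, n_b=24, overlap=7)
print(f"P(overlap >= 7) = {p:.2e}")
```

Because the P-value depends strongly on the chosen universe, the background set should be stated whenever such an overlap statistic is reported.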

In the functional enrichment analysis for ATXN1-Q32 
and ATXN1-Q79, we found overrepresentations of differ- 
ent signaling pathways and several interesting targets 
(Supplementary Table S6). Indeed, at least 10 out of 81 
proteins detected in the ATXN1 screens take part in one 
or several signaling pathways, such as Lkb1, IFNγ, IGF1,
mTOR and others (enrichment for the Lkb1 pathway:
P = 4.6 × 10⁻⁴ for -Q79 and P = 2.6 × 10⁻⁵ for -Q32).
Examples for signaling proteins that were found with 
both isoforms include the signal transducing adaptor 
molecule 2 (STAM2), the mTOR associated protein 
LST8 homolog (MLST8) and the hamartin protein 
TSC1 (Figure 4C). We also found association with 
14-3-3 proteins (YWHAE, YWHAZ and YWHAQ 
above or slightly below the chosen cutoffs), which are 
known modulators of ATXN1-mediated neurodegeneration
(38), confirming previously published
results (see Supplementary Table S7). For both wild-type
and mutant ATXN1 isoforms, we found GO-terms
enriched that are related to neuronal cell growth and 
brain development, such as 'growth cone', 'pallium devel- 
opment', and 'neuron projection', suggesting that ATXN1 
function is critical for these processes. For example, genes 
among the high confidence scores associated with 'growth 
cone' included TSC1, orthodenticle homeobox 2 (OTX2), 
brain acid soluble protein 1 (BASP1) and the neuronal 
acetylcholine receptor subunit alpha-7 (CHRNA7) 
(Figure 4C). Overall, the GO analysis suggests a role for 
ATXN1 in cell signaling and neuronal functions.

Sampling the distribution of gene sets with QiSampler 
using GO-annotated genes instead of literature-positive 
interactors allows a global comparison of quantitative 
enrichments in PPI patterns for both ATXN1 isoforms 
(Supplementary Figure S6). When sampling for the 
Lkb1 signaling pathway and the GO-terms 'growth
cone' and 'learning', we found similar ROC performances
for both mutant and wild-type ATXN1 baits. For Lkb1
gene associations, for example, pool-ratio ROC AUC values
were in the same range for ATXN1-Q32 and ATXN1-Q79 
(0.626 and 0.64). Likewise, for most other GO-terms 
investigated (not shown), sampling reflects a similar 
distribution of classifiers among the scores. In a further 
attempt to quantify the enrichments, we sampled the Y2H 
scores with two 'molecular function' GO-associations, 
'protein domain specific binding' and 'phosphoprotein 
binding' (Supplementary Table S6). Here, ROC perform- 
ances show a selective association of 'phosphoprotein 
binding' with the expanded ATXN1-Q79 form, while for 
'protein domain specific binding', a similar result for 
wild-type and mutant ATXN1 proteins was obtained. In 
conclusion, GO term enrichments and individual 
samplings revealed that the overall PPI pattern of 
ATXN1 is similar for the Q32 and Q79 forms. This indi- 
cates that enhancement of wild-type protein binding de- 
termines pathogenesis of ATXN1 on polyglutamine 
expansion, as opposed to pathogenesis being due to 
binding the wrong partners. 
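
As an illustration of how a ROC AUC can be derived from ranked Y2H scores and a binary annotation (known positives or GO-annotated genes), a minimal Python sketch using the rank-sum (Mann-Whitney) formulation follows. It is not the QiSampler implementation; the function name `roc_auc` and the toy scores are hypothetical.

```python
def roc_auc(scores, is_positive):
    """ROC AUC as the probability that a random positive outscores a
    random negative (ties count one half), computed from average ranks."""
    pairs = sorted(zip(scores, is_positive))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, positive in pairs if positive)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")

    rank_sum = sum(r for r, (_, positive) in zip(ranks, pairs) if positive)
    u_statistic = rank_sum - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Hypothetical log2 ratios for four preys; True marks an annotated gene.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [False, False, True, True]
print(roc_auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to annotated genes being distributed randomly among the scores, which is why values such as 0.52 are described above as close to random.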



DISCUSSION 

We describe a novel approach for the detection of high 
quality Y2H PPIs using DNA microarrays and quantita- 
tive statistics. The concept study presented here takes full 
advantage of the established tools for the analysis of DNA 
microarray data and could have important implications 
for how future research on protein interactomes is
conducted.

We concentrated our proof-of-principle experiments on 
the HTT and ATXN1 proteins, which are both neurotoxic 
on polyglutamine repeat expansion (18). The approach 
was validated by the generation of a set of high-confidence 
PPIs for the HTT protein, which were based on micro- 
array data after multiple testing for significance. These 
results were benchmarked against sets of known positive 
PPIs using a quantitative sampling strategy. F-statistics 
based on precision-recall distributions was used to deter- 
mine automated cutoffs for high-confidence interactions. 
PPIs were further restricted by applying two distinct
background controls (pool and vector), which allows the
simultaneous selection of Y2H positives and the filtering 
of unspecific autoactivators. Notably, almost two-thirds 
of the final high-confidence PPIs for a HTT bait protein 
were known positives or validated by a modified 
LUMIER assay. Hence, by using quantitative benchmark- 
ing and F-statistics, we established a microarray-based 
Y2H screening method for the high-confidence mapping 
of PPI networks. However, we also advocate that results 
may be interpreted with different procedures, depending 
on the overall screening performance, the availability of 
sets of known positives and also on the specific aims 
intended by individual researchers (see Supplementary 
information). 
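
The idea of deriving a cutoff from precision-recall statistics can be sketched as follows. This is a minimal Python illustration using the standard F-beta measure, which is an assumption; the exact statistic and weighting used in this study are described in the Supplementary information and are not reproduced here. The function name `best_cutoff` and the toy data are hypothetical.

```python
def best_cutoff(scores, is_positive, beta=1.0):
    """Return (cutoff, f_score) maximizing F-beta when every prey with
    score >= cutoff is called an interaction."""
    n_pos = sum(is_positive)
    best = (None, 0.0)
    for cutoff in sorted(set(scores)):
        called = [pos for s, pos in zip(scores, is_positive) if s >= cutoff]
        true_pos = sum(called)
        if true_pos == 0:
            continue  # precision and recall are both zero here
        precision = true_pos / len(called)
        recall = true_pos / n_pos
        f_score = ((1 + beta**2) * precision * recall
                   / (beta**2 * precision + recall))
        if f_score > best[1]:
            best = (cutoff, f_score)
    return best


# Hypothetical log2 ratios; True marks a known literature positive.
toy_scores = [0.2, 0.5, 1.1, 1.4, 1.9, 2.3]
toy_known = [False, False, False, True, False, True]
cutoff, f_score = best_cutoff(toy_scores, toy_known)
print(cutoff, round(f_score, 3))
```

In a two-control design such as the one described above, a cutoff of this kind would be derived separately for the pool and vector comparisons, and a prey would be retained only if it passes both.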

Besides the mapping of individual high-confidence 
PPIs, microarray Y2H screening data can be more 
broadly interpreted for enrichments of pathways and 
functional associations. This may be important when ad- 
dressing biological consequences of mutations that alter 
structural properties in proteins and thus underlie global 
perturbations in PPI networks and potentially influence 
the outcome of disease (39). Specifically, we addressed 
here potential differences in PPI patterns between 
protein isoforms (ATXN1-Q32 and ATXN1-Q79, con- 
taining short and expanded polyQ tracts). In this assay, 
ATXN1-Q79 exhibits more and stronger Y2H interactions 
than ATXN1-Q32. On the other hand, our data analysis 
also shows that the overall PPI patterns of wild-type and 
mutant ATXN1 are not radically different, suggesting that 
ATXN1 pathology results from abnormally strong inter- 
actions with its biological partners. Although resulting 
from a screening effort in a heterologous system (yeast), 
this finding is consistent with previously observed effects 
of expanded polyQ tracts in ATXN1 and other polyQ 
disease proteins (20,21,37). This example demonstrates 
how microarray-based Y2H procedures can be used in 
conjunction with extensive data-mining strategies to 
predict the biological consequences of altered proteins. 

While DNA microarrays were used to address Y2H 
results in an earlier study (40), a quantitative procedure, 
such as the one presented here with large-scale pooling of 
a prey library, unbiased selection by competitive growth 
and systematic control measurements, was not attempted 
before. This approach has two major advantages over 
matrix-based Y2H screenings. First, PPIs are 
characterized as scores with different parameters (ratios, 
P-values, etc.) over a wide dynamic range, instead of being 
simple counts from identifications in replicate screens. 
Repetitive sampling strategies and the application of two 
background controls (pool and pBTM comparisons) have 
the important consequence that potential false-positive 
interactions can be addressed and eliminated (see 
Supplementary information for discussion of false 
positive interactions). Because false positives are sometimes
estimated to account for up to 50% of all reported interactions
(41,42), their minimization would constitute a major advantage
for the mapping of high-confidence PPIs, also reducing
the need for confirmation with orthogonal assays.
Second, smaller volumes of medium for yeast mating 
and selection as well as the efficient readout provided by 
DNA microarrays greatly reduce labor and material costs. 
Simplifying the screening procedure increases potential
throughput, and therefore larger numbers of Y2H
screens can be performed in parallel. However, while our
system is superior to the 'classical' Y2H method with
respect to quantitative measurements, it also has some
limitations. First, 'color'-based scoring of interactions
via lacZ activation is not possible for the pool-based 
screening scheme. Second, some ORFs may not undergo 
proper PCR amplification, which could lead to a fraction 
of putative PPIs that are undetectable in microarray-Y2H
assays. Indeed, a bias against longer DNA sequences is
evidenced by the lower representation of ORFs >2 kb in
size on the microarray (Supplementary Figure S2). Third,
prey proteins in the complex pool that occur as different 
isoforms or with individual mutations may be indistin- 
guishable on the DNA microarray. Hence, for optimal 
coverage of potential PPIs, DNA microarray and 
matrix-based robotic Y2H procedures should be envi- 
sioned as complementary approaches. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Information and Methods, Supplementary 
Tables 1-7, Supplementary Figures 1-6 and Supplementary 
References [43-50]. 